A Comparative Study of Clustering Algorithms on Narrow - Domain

نویسندگان

  • David Pinto
  • Paolo Rosso
  • Alfons Juan
  • Héctor Jiménez-Salazar
چکیده

Clustering abstracts of scientific texts of very narrow domain implies a big challenge. The first problem to attend is the high overlapping among the document’s vocabularies, besides the low frequency of these terms. The transition point technique has been successfully used in this area of Natural Language Processing (NLP). Its best properties rely on the extraction of the mid-frequency terms. Although the importance of these terms on NLP has been known from time ago, the exact extraction of these terms is unknown. In this paper we present an application of this technique as a feature selection technique in two corpora of very narrow domain. The experimental results show that the transition point technique obtains the best results of F-measure with five different clustering methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Comparative Study of Some Clustering Algorithms on Shape Data

Recently, some statistical studies have been done using the shape data. One of these studies is clustering shape data, which is the main topic of this paper. We are going to study some clustering algorithms on shape data and then introduce the best algorithm based on accuracy, speed, and scalability criteria. In addition, we propose a method for representing the shape data that facilitates and ...

متن کامل

A Comparative Study between a Pseudo-Forward Equation (PFE) and Intelligence Methods for the Characterization of the North Sea Reservoir

This paper presents a comparative study between three versions of adaptive neuro-fuzzy inference system (ANFIS) algorithms and a pseudo-forward equation (PFE) to characterize the North Sea reservoir (F3 block) based on seismic data. According to the statistical studies, four attributes (energy, envelope, spectral decomposition and similarity) are known to be useful as fundamental attributes in ...

متن کامل

Improving Vehicular Ad-Hoc Network Stability Using Meta-Heuristic Algorithms

Vehicular ad-hoc network (VANET) is an important component of intelligent transportation systems, in which vehicles are equipped with on-board computing and communication devices which enable vehicle-to-vehicle communication. Consequently, with regard to larger communication due to the greater number of vehicles, stability of connectivity would be a challenging problem. Clustering technique as ...

متن کامل

A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset

Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...

متن کامل

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007